LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning
Du, Yali; Han, Lei; Fang, Meng; Liu, Ji; Dai, Tianhong; Tao, Dacheng
A central challenge in cooperative decentralized multi-agent reinforcement learning (MARL) is generating diverse behaviors for individual agents when only a team reward is available. Prior studies have devoted much effort to reward shaping or to designing a centralized critic that can discriminatively credit the agents. In this paper, we propose to merge the two directions and learn an intrinsic reward function for each agent that stimulates the agents differently at each time step. Specifically, each agent's intrinsic reward is used to compute a distinct proxy critic that directs the update of its individual policy. Meanwhile, the parameterized intrinsic reward function is updated to maximize the expected accumulated team reward from the environment, so the objective remains consistent with the original MARL problem. We refer to the proposed method as learning individual intrinsic reward (LIIR) in MARL. We compare LIIR with a number of state-of-the-art MARL methods on battle games in StarCraft II. The results demonstrate the effectiveness of LIIR, and we show that LIIR assigns each individual agent an insightful intrinsic reward at every time step.
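To make the learning scheme concrete, the following is a minimal sketch of the bi-level update the abstract describes, written in our own notation rather than the paper's: the proxy-reward weight \lambda, the intrinsic-reward parameters \eta, the per-agent policy parameters \theta_i, and the step sizes \alpha, \beta are all assumed labels here.

    % each agent i receives a proxy reward: shared team (extrinsic) reward
    % plus its own learned intrinsic reward
    r_i^{\text{proxy}}(s_t, u_t) = r^{\text{ex}}(s_t, u_t) + \lambda\, r_{\eta,i}^{\text{in}}(s_t, u_t)

    % inner step: each agent's policy ascends its own proxy return
    % (estimated via the agent's distinct proxy critic)
    \theta_i' = \theta_i + \alpha\, \nabla_{\theta_i}\, \mathbb{E}\Big[\textstyle\sum_t \gamma^t\, r_i^{\text{proxy}}(s_t, u_t)\Big]

    % outer step: the intrinsic-reward parameters ascend the team return
    % alone, with the gradient taken through the updated policies
    \eta \leftarrow \eta + \beta\, \nabla_{\eta}\, \mathbb{E}\Big[\textstyle\sum_t \gamma^t\, r^{\text{ex}}(s_t, u_t)\Big]\Big|_{\theta_i' = \theta_i'(\eta)}

Because the outer gradient flows only through \theta_i'(\eta), the intrinsic rewards survive only insofar as they improve the extrinsic team return, which is why the overall objective stays consistent with the original MARL problem.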
Reviews: LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning
Overall, the proposed method is a straightforward application of a known intrinsic reward (IR) method to MARL; the results are promising and the writing is clear. As such, the work has limited novelty but makes a good empirical contribution, though this too could be strengthened by considering more domains. A more detailed review of the paper, along with feedback and requested clarifications, is provided below. The work is motivated by the claim that providing individual IRs to different agents in a population (in a MARL setting) will allow diverse behaviours. However, the analysis at the end of the paper shows that many of the learned IR curves overlap.
The paper extends the idea of learning intrinsic rewards to the centralized-training, decentralized-execution, cooperative multi-agent setting. This setting has become popular in recent years, as it holds high potential for real-world applications while remaining amenable to progress towards tractable solutions. The approach presented in this work is conceptually simple and well motivated. The authors empirically show that it outperforms existing state-of-the-art approaches on challenging StarCraft II benchmark tasks. Reviewers raised several concerns about the paper, including clarity (experiment details, a precise description of the approach, and its distinction from existing approaches) and the need for further analysis.